SQL Server 2008 R2 : Database Files and Filegroups (part 1)

7/7/2013 9:34:44 PM

Databases in SQL Server 2008 span at least two, and optionally several, database files. There must always be at least one file for data and one file for the transaction log. These database files are normal operating system files created in a directory within the operating system. These files are created when the database is created or when a database is expanded.

Each database file has the following set of properties:

A logical filename: This name is used for internal reference to the file.
A physical filename: This name is the full operating system path of the file.
An initial size: If no size is specified for the primary data file, its initial size defaults to the minimum size required to hold the contents of the model database.
An optional maximum size: A maximum file size limit can be specified.
A file growth increment: This amount is specified as a fixed size (in megabytes by default) or as a percentage.

The properties of each file in a database are stored in that database and are visible via the system catalog view sys.database_files. This view exists in every database and contains one row for each of that database's files. The master database contains a similar view, sys.master_files, that contains file information for all databases within the SQL Server instance. Table 1 lists the most useful columns in the sys.database_files view.

Table 1. Columns of the sys.database_files Catalog View
Column Name	Description
file_id	A file identification number that is unique within each database
file_guid	GUID for the file
type	File type (0=rows [that is, data files], 1=log, 2=FILESTREAM, 4=full-text catalogs created prior to SQL Server 2008)
type_desc	Description of the file type (ROWS, LOG, FILESTREAM, FULLTEXT)
data_space_id	0 represents a log file; values > 0 represent the ID of the filegroup the data file belongs to
name	The logical name of the file
physical_name	The physical name of the file, including path
state	File state (0 = ONLINE, 1 = RESTORING, 2 = RECOVERING, 3 = RECOVERY_PENDING, 4 = SUSPECT, 6 = OFFLINE, 7 = DEFUNCT)
state_desc	Description of the file state (ONLINE, RESTORING, RECOVERING, RECOVERY_PENDING, SUSPECT, OFFLINE, DEFUNCT)
size	Current size of the file in 8KB pages
max_size	Maximum file size in 8KB pages
growth	File growth setting (0=fixed, >0=autogrow in units of 8KB pages or by percentage if is_percent_growth is set to 1)
is_media_read_only	1=file is on read-only media
is_read_only	1= file is marked read-only
is_sparse	1=file is a sparse file
is_percent_growth	1=growth of file value is percentage
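
As a minimal sketch of how these columns might be read in practice, the following query lists each file of the current database and converts the page counts to megabytes. It assumes the SQL Server 2008 column names, in which the physical path is exposed as physical_name. The column aliases (logical_name, size_mb, and so on) are illustrative only, and sizes are rounded down to whole megabytes.

```sql
-- Run in the context of the database you want to inspect.
SELECT  file_id,
        name AS logical_name,
        physical_name,
        type_desc,
        state_desc,
        -- size is stored in 8KB pages; cast to bigint to avoid overflow
        CAST(size AS bigint) * 8 / 1024 AS size_mb,
        CASE max_size
            WHEN -1 THEN 'unlimited'   -- grows until the disk is full
            WHEN 0  THEN 'no growth'
            ELSE CAST(CAST(max_size AS bigint) * 8 / 1024 AS varchar(20)) + ' MB'
        END AS max_size_desc,
        CASE
            WHEN growth = 0 THEN 'fixed size'
            WHEN is_percent_growth = 1 THEN CAST(growth AS varchar(20)) + '%'
            ELSE CAST(CAST(growth AS bigint) * 8 / 1024 AS varchar(20)) + ' MB'
        END AS growth_desc
FROM    sys.database_files
ORDER BY file_id;
```

The same SELECT list should also work against sys.master_files if you add database_id to identify which database each file belongs to.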

SQL Server uses the file location information visible in the sys.master_files catalog view most of the time. However, the Database Engine uses the file location information stored in the primary file to initialize the file location entries in the master database when attaching a database using the CREATE DATABASE statement with either the FOR ATTACH or FOR ATTACH_REBUILD_LOG options.
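
The attach behavior described above might look like the following sketch. The database name Sales and the file path are hypothetical. Only the primary file is named; the locations of the remaining files are read from it, which works only if those files are still at the paths recorded when the database was last attached or created.

```sql
-- Hypothetical database name and path.
-- Only the primary data file is listed; the other file locations
-- are taken from the information stored in the primary file.
CREATE DATABASE Sales
ON ( FILENAME = 'D:\SQL_data\Sales_Data.mdf' )
FOR ATTACH;
GO
```

If any file has moved since the database was detached, it would need its own FILENAME entry in the ON clause.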

Every database can have three types of files:

Primary data file
Secondary data files
Log files

In addition, SQL Server 2008 databases can have FILESTREAM data files and full-text data files.

Primary Data File

Every database has exactly one primary database file. The location of the primary database file is stored in the master database (visible via the physical_name column in the sys.master_files view). When SQL Server opens a database, it looks for this file and then reads from it the information on the other files defined for the database.
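
As a rough illustration, the following query lists the primary data file recorded in the master database for every database on the instance. It relies on the assumption that the primary data file is the one with file_id 1; the column aliases are illustrative.

```sql
SELECT  DB_NAME(database_id) AS database_name,
        name                 AS logical_name,
        physical_name
FROM    sys.master_files
WHERE   file_id = 1          -- assumption: file_id 1 is the primary data file
ORDER BY database_name;
```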

The file extension for the primary database file defaults to .mdf. The primary database file always belongs to the primary filegroup, which is also the default filegroup unless you designate a different one. It is often sufficient to have only one database file for storing your tables and indexes (the primary database file). The file can, of course, be created on a RAID partition to help spread I/O. However, if you need finer control over placement of your tables across disks or disk arrays, or if you want to be able to back up only a portion of your database via filegroups, you can create additional, secondary data files for a database.

Secondary Data Files

A database can have many secondary files (the maximum number of files per database is 32,767, which should be sufficient for most implementations). You can put a secondary file in the default filegroup or in another filegroup defined for the database. Secondary data files have the file extension .ndf by default.

Following are some situations in which the use of secondary database files might be beneficial:

You want to perform a partial backup. A backup can be performed for the entire database or a subset of the database. The subset is specified as a set of files or filegroups. The partial backup feature is useful for large databases, where it is impractical to back up the entire database. When recovering with partial backups, a transaction log backup must also be available.
You want more control over placement of database objects. When you create a table or index, you can specify the filegroup in which the object is created. This could help you spread I/O by placing your most active tables or indexes on separate filegroups defined on separate disks or disk arrays.
You want more flexibility when restoring. Creating multiple files on a single disk provides no real performance benefit but could help in recovery. If you have a 90GB database in a single file and have to restore it, you need to have enough disk space available to create a new 90GB file. If you don’t have 90GB of space available on a single disk, you cannot restore the database. On the other hand, if the database was created with three files, each 30GB in size, you are more likely to find three 30GB chunks of space available on your server.
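
The situations above can be sketched in Transact-SQL as follows. All database, filegroup, file, and table names and all paths are hypothetical. The final backup step assumes the database uses the full or bulk-logged recovery model, because backups of individual read/write filegroups are restricted under the simple recovery model.

```sql
-- All names and paths below are hypothetical.

-- 1. Add a filegroup and place a secondary data file in it.
ALTER DATABASE Sales ADD FILEGROUP Orders_FG;
GO
ALTER DATABASE Sales
ADD FILE ( NAME = 'Sales_Orders1',
           FILENAME = 'G:\SQL_data\Sales_Orders1.ndf',
           SIZE = 100MB,
           FILEGROWTH = 20MB )
TO FILEGROUP Orders_FG;
GO

-- 2. Create a table on that filegroup to control its placement.
USE Sales;
GO
CREATE TABLE dbo.Orders
( OrderID   int      NOT NULL PRIMARY KEY,
  OrderDate datetime NOT NULL )
ON Orders_FG;
GO

-- 3. Back up only that filegroup rather than the whole database.
BACKUP DATABASE Sales
    FILEGROUP = 'Orders_FG'
TO DISK = 'H:\SQL_backup\Sales_Orders_FG.bak';
GO
```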

The Log File

Each database must have at least one log file. The log file contains the transaction log records of all changes made in a database. By default, log files have the file extension .ldf.

A database can have several log files, and each log file can have a maximum size of 2TB. A log file cannot be part of a filegroup. No information other than transaction log records can be written to a log file.
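
As a small sketch, a second log file could be added to an existing database as shown here. The database name, logical name, and path are hypothetical. Note that there is no TO FILEGROUP clause, because log files do not belong to filegroups.

```sql
-- Hypothetical names and path.
ALTER DATABASE Sales
ADD LOG FILE ( NAME = 'Sales_Log2',
               FILENAME = 'F:\SQL_data\Sales_Log2.ldf',
               SIZE = 50MB,
               FILEGROWTH = 50MB );
GO

-- Report the log size and the percentage of log space in use
-- for every database on the instance.
DBCC SQLPERF(LOGSPACE);
```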

File Management

In SQL Server 2008, you can specify that a database file should grow automatically as space is needed. SQL Server can also shrink the size of the database if the space is not needed. You can control whether to use this feature along with the increment by which the file is to be expanded. The increment can be specified as a fixed number of megabytes or as a percentage of the current size of the file. You can also set a limit on the maximum size of the file or allow it to grow until no more space is available on the disk.
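
These settings can also be changed after a file exists. The following is a minimal sketch that assumes a hypothetical database named Sales with files whose logical names are Sales_Data and Sales_Log; the sizes are arbitrary. A new MAXSIZE value has to be at least as large as the file's current size.

```sql
-- Hypothetical database and logical file names.

-- Use a fixed 64MB growth increment and cap the data file at 10GB.
ALTER DATABASE Sales
MODIFY FILE ( NAME = 'Sales_Data',
              FILEGROWTH = 64MB,
              MAXSIZE = 10GB );
GO

-- Let the log file grow until no more disk space is available.
ALTER DATABASE Sales
MODIFY FILE ( NAME = 'Sales_Log',
              MAXSIZE = UNLIMITED );
GO
```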

Listing 1 provides an example of a database being created with a 10MB growth increment for the first database file, 20MB for the second, and 20% growth increment for the log file.

Listing 1. Creating a Database with Autogrowth

CREATE DATABASE Customer
ON ( NAME='Customer_Data',                       -- primary data file
    FILENAME='D:\SQL_data\Customer_Data1.mdf',
    SIZE=50MB,
    MAXSIZE=100MB,
    FILEGROWTH=10MB),
   ( NAME='Customer_Data2',                      -- secondary data file
    FILENAME='E:\SQL_data\Customer_Data2.ndf',
    SIZE=100MB,
    FILEGROWTH=20MB)                             -- no MAXSIZE: grows until the disk is full
LOG ON ( NAME='Customer_Log',                    -- transaction log file
    FILENAME='F:\SQL_data\Customer_Log.ldf',
    SIZE=50MB,
    FILEGROWTH=20%)
GO

The Customer_Data file has an initial size of 50MB, a maximum size of 100MB, and a file growth increment of 10MB.

The Customer_Data2 file has an initial size of 100MB, has a file growth increment of 20MB, and can grow until the E: disk partition is full.

The transaction log has an initial size of 50MB; the file increases by 20% with each file growth. The increment is based on the current file size, not the size originally specified.
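
To make the compounding concrete, the following sketch computes the nominal log file size after each of the first four growth events, starting from 50MB and growing by 20% of the current size each time. It is plain arithmetic in a recursive common table expression and does not touch any database; the actual sizes SQL Server allocates may differ slightly because of internal rounding.

```sql
WITH growth AS
(
    -- Starting point: the initial log size from Listing 1
    SELECT 0 AS growth_event,
           CAST(50 AS decimal(10,2)) AS size_mb
    UNION ALL
    -- Each growth adds 20% of the current size, not of the original size
    SELECT growth_event + 1,
           CAST(size_mb * 1.20 AS decimal(10,2))
    FROM   growth
    WHERE  growth_event < 4
)
SELECT growth_event, size_mb
FROM   growth;
-- size_mb by growth_event: 50.00, 60.00, 72.00, 86.40, 103.68
```

Because each increment is larger than the last, percentage growth on a large file can produce very large single growth operations, which is one reason a fixed increment is often chosen instead.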

When creating or expanding data files, SQL Server 2008 can use instant file initialization (sometimes called fast file initialization). This allows file creation and growth to complete quickly. With instant file initialization, the space is added to the data file immediately, but without initializing the logical pages in the data file with zeros. The existing disk content in the data file is not overwritten until new data is written to the files. This provides a significant performance advantage when a data file autogrows while an application is attempting to write data to the database. The application does not need to wait until the space is initialized; it can begin writing to the database immediately. Instant file initialization applies only to data files and requires that the SQL Server service account hold the Windows Perform Volume Maintenance Tasks privilege; log files are always zero-initialized.

SQL Server also provides an option to autoshrink databases as well as manually shrink databases. However, shrinking a database is a resource-intensive process and should be done only if it is absolutely imperative to reclaim disk space. Also, if a data file is constantly shrinking and growing, it can lead to excessive file fragmentation at the file system level as well as excessive logical fragmentation within the file, both of which can lead to poor I/O performance.
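
A hedged sketch of the related commands follows. The database name Sales and the logical file name Sales_Data are hypothetical, and the 100MB target is arbitrary.

```sql
-- Hypothetical database and logical file names.

-- Check whether autoshrink is currently enabled.
SELECT name, is_auto_shrink_on
FROM   sys.databases
WHERE  name = 'Sales';

-- Turn autoshrink off to avoid repeated shrink/grow cycles.
ALTER DATABASE Sales SET AUTO_SHRINK OFF;
GO

-- One-off manual shrink of a single file to a target size in MB.
USE Sales;
GO
DBCC SHRINKFILE (N'Sales_Data', 100);
GO
```

Because shrinking moves pages within the file, indexes are likely to need attention afterward, which is one more reason to reserve it for cases where the disk space genuinely has to be reclaimed.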